Today I’m exploring the data on Fatal Car Crashes on 4/20 provided by TidyTuesday.

The Data

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
daily_accidents_420 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-22/daily_accidents_420_time.csv')
## Rows: 23086 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): fatalities_count
## lgl  (1): d420
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
DT::datatable(daily_accidents_420 %>% rename_with(~c("Date", "d420", "Fatalities Count")))
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

First, I’ll prepare a new table to do further analysis.

table1 = daily_accidents_420  %>%
  pivot_wider(
    names_from = d420,
    values_from = fatalities_count,
  ) %>%
  rename(d420_false = `FALSE`, d420_true = `TRUE`, d420_na = `NA`) %>%
  mutate(year = year(date), month = month(date, label = TRUE), day = day(date)) %>% 
  rowwise() %>%
  mutate(d_total = sum(d420_false, d420_true, ifelse(is.na(d420_na),0,d420_na)))
DT::datatable(table1 %>% rename_with(~c("Date", "d420_false", "d420_true", "d420_na", "Year", "Month", "Day", "Fatalities Count")), class = 'cell-border stripe')

Graph

ggplot(daily_accidents_420, aes(x = date, y = fatalities_count, color = d420)) + 
  geom_point(aes(color = d420, shape = d420)) +
  labs(
    title = "Fatalities Count",
    color = "Accident between 4:20pm and 11:59pm", 
    shape = "Accident between 4:20pm and 11:59pm",
    x = "Date", 
    y = "Fatalities",
    tag = "Figure 1"
  ) +
  theme(legend.box = "horizontal", legend.position = "bottom")
## Warning: Removed 4822 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(daily_accidents_420, aes(x = date, y = fatalities_count, color = d420)) + 
  geom_smooth(aes(color = d420)) + 
  geom_smooth(method = "lm", aes(x = date, y = fatalities_count, color = "Overall Fatalities")) + 
  labs(
    title = "Fatalities Trend",
    color = "Accident between 4:20pm and 11:59pm", 
    x = "Date", 
    y = "Fatalities",
    tag = "Figure 2"
  ) +
  theme(legend.box = "horizontal", legend.position = "bottom")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using formula = 'y ~ x'

According to the Figures 1 and 2, accidents that didn’t happened between 4:20 and 11:59 PM on any given day declined between 2005 and 2010, but began to rise again in around 2015. Similarly the accidents that happened between 4:20 and 11:59 PM on any given day also declined between 2005 and 2010 while having slow increase till 2005, and had a incline around the year 2015.

While, the overall fatalities shows a slow decline in trend.

p = ggplot(table1, aes(x = year, y = month, fill = d_total)) + 
  geom_tile() + 
  scale_fill_gradient(low="white", high="red") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(
    title = "Heatmap of accidents cooresponding to Year and Month",
    fill = "Fatalities",
    x = "Year", y = "Month",
    tag = "Figure 3"
  ) + 
  theme_classic(base_size = 8, base_family = "JetBrains Mono")

plotly::ggplotly(p)
## Warning in matrix(g$fill_plotlyDomain, nrow = length(y), ncol = length(x), :
## data length [9132] is not a sub-multiple or multiple of the number of columns
## [25]
## Warning in matrix(g$hovertext, nrow = length(y), ncol = length(x), byrow =
## TRUE): data length [9132] is not a sub-multiple or multiple of the number of
## columns [25]
DT::datatable(
  table1 %>%
    arrange(desc(d_total)) %>%
    select(year,month,day,d_total) %>%
    rename("Fatalities Count" = d_total)  
)

Bibliography

[1]. Harper S, Palayew A “The annual cannabis holiday and fatal traffic crashes.” BMJ Injury Prevention. Published Online First: 29 January 2019. doi: 10.1136/injuryprev-2018-043068. Manuscript and original data/code at https://osf.io/qnrg6/

[2]. Staples JA, Redelmeier DA. “The April 20 cannabis celebration and fatal traffic crashes in the United States.” JAMA Intern Med. 2018 Feb; 178(4):569–72.

[3]. Harmon J, Hughes E (2024). tidytuesdayR: Access the Weekly ‘TidyTuesday’ Project Dataset. doi:10.32614/CRAN.package.tidytuesdayR https://doi.org/10.32614/CRAN.package.tidytuesdayR, R package version 1.1.2, https://CRAN.R-project.org/package=tidytuesdayR.